Semantic Similarity of Short Texts

Author

  • Aminul Islam
Abstract

This paper presents a method for measuring the semantic similarity of texts using a corpus-based measure of semantic word similarity and a normalized and modified version of the Longest Common Subsequence (LCS) string matching algorithm. Existing methods for computing text similarity have focused mainly on either large documents or individual words. In this paper, we focus on computing the similarity between two sentences or between two short paragraphs. The proposed method can be exploited in a variety of applications involving textual knowledge representation and knowledge discovery. Evaluation results on two different data sets show that our method outperforms several competing methods.
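As a rough illustration of the string-matching component mentioned in the abstract, the sketch below computes the length of the longest common subsequence of two strings and normalizes it to the range [0, 1]. The abstract does not give the formula, so the normalization used here (squared LCS length divided by the product of the two string lengths) is an assumption, and the function names `lcs_length` and `normalized_lcs` are hypothetical. The corpus-based word similarity measure and the paper's modifications to LCS are not covered.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)  # DP row for the previous character of a
    for ch_a in a:
        curr = [0]  # column 0: empty prefix of b
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the subsequence ending just before both characters.
                curr.append(prev[j - 1] + 1)
            else:
                # Skip one character from either a or b.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def normalized_lcs(a: str, b: str) -> float:
    """Assumed normalization: len(LCS)^2 / (len(a) * len(b)), in [0, 1]."""
    if not a or not b:
        return 0.0
    n = lcs_length(a, b)
    return (n * n) / (len(a) * len(b))
```

Under this assumed normalization, identical strings score 1.0 and strings with no characters in common score 0.0, so the value can be combined with other similarity scores on the same scale.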


Similar resources

Presentation of an efficient automatic short answer grading model based on combination of pseudo relevance feedback and semantic relatedness measures

Automatic short answer grading (ASAG) is the automated process of assessing natural-language answers using computational methods and machine learning algorithms. The development of large-scale smart education systems on the one hand, and the importance of assessment as a key factor in the learning process and the challenges it faces on the other, have significantly increased the need for ...

Semantic Similarity Measure for Pairs of Short Biological Texts

Finding the semantic similarity between biological texts, especially short texts such as article abstracts and experiment descriptions of microarrays, may yield important information for experts in that field. To date, these methods have not been widely explored. In this paper, a comparison of different measures to calculate the semantic similarity of pairs of short biological texts is presente...


Corpus-based and Knowledge-based Measures of Text Semantic Similarity

This paper presents a method for measuring the semantic similarity of texts, using corpus-based and knowledge-based measures of similarity. Previous work on this problem has focused mainly on either large documents (e.g. text classification, information retrieval) or individual words (e.g. synonymy tests). Given that a large fraction of the information available today, on the Web and elsewhere,...


A Fast Approach for Semantic Similar Short Texts Retrieval

Retrieving semantically similar short texts is a crucial issue for many applications, e.g., web search, ad matching, question-answer systems, and so forth. Most traditional methods concentrate on improving the precision of the similarity measurement, while current real applications need to efficiently explore the top similar short texts semantically related to the query one. We address th...


Improving Semantic Similarity for Pairs of Short Biomedical Texts with Concept Definitions and Ontology Structure

Finding semantic similarity between short biomedical texts, such as article abstracts or experiment descriptions, may provide important information for health researchers. This paper presents a method for calculating text similarity in the biomedical context. The method implements a pairwise concept semantic similarity measure that uses concept definitions and ontology structure. The respective...


Publication date: 2007